# CLIP visual encoders

All of the models below are image encoders published under the `timm` organization and tagged "Image Classification · Transformers" on the hub. Names are given as their `timm` model identifiers.

| Model | License | Downloads | Likes | Description |
|---|---|---|---|---|
| `resnet101_clip_gap.openai` | Apache-2.0 | 104 | 0 | ResNet-101 image encoder from the OpenAI CLIP model, extracting image features via global average pooling (GAP). |
| `resnet50x64_clip_gap.openai` | Apache-2.0 | 107 | 0 | CLIP image encoder based on ResNet-50 with 64x width expansion, using GAP pooling. |
| `resnet50x16_clip_gap.openai` | Apache-2.0 | 129 | 0 | ResNet-50x16 CLIP variant for image feature extraction. |
| `resnet50x4_clip_gap.openai` | Apache-2.0 | 170 | 0 | ResNet-50x4 CLIP variant for image feature extraction. |
| `vit_large_patch14_clip_224.dfn2b` | Other | 178 | 0 | ViT-Large CLIP image encoder released by Apple as part of DFN2B-CLIP. |
| `vit_huge_patch14_clip_224.dfn5b` | Other | 128 | 0 | ViT-Huge CLIP image encoder released by Apple as part of DFN5B-CLIP, suited to visual feature extraction. |
| `vit_base_patch16_clip_224.dfn2b` | Other | 444 | 0 | ViT-Base CLIP encoder carrying Apple's DFN2B-CLIP image-tower weights. |
| `vit_huge_patch14_clip_224.laion2b` | Apache-2.0 | 1,969 | 0 | ViT-Huge CLIP visual encoder trained on the LAION-2B dataset. |
| `vit_base_patch32_clip_256.datacompxl` | Apache-2.0 | 89 | 0 | ViT-B/32 CLIP encoder trained on DataComp XL, accepting 256x256 input. |
| `vit_base_patch32_clip_224.laion2b` | Apache-2.0 | 83 | 0 | ViT-B/32 CLIP encoder trained on the LAION-2B dataset. |
| `vit_base_patch32_clip_224.datacompxl` | Apache-2.0 | 13 | 0 | ViT-B/32 CLIP encoder trained on the DataComp XL dataset. |
| `vit_base_patch16_clip_224.datacompxl` | Apache-2.0 | 36 | 0 | ViT-B/16 CLIP encoder trained on the DataComp XL dataset. |
| `convnext_base.clip_laiona` | Apache-2.0 | 14 | 0 | ConvNeXt-Base CLIP image encoder trained on the LAION-Aesthetic dataset. |
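Any entry in the table can be loaded through `timm` for feature extraction. The sketch below is a minimal example, not taken from the listing itself: the model name comes from the table, `example.jpg` is a placeholder path, and `num_classes=0` removes the classifier head so the forward pass returns pooled features (for the `_clip_gap` ResNets, this pooled vector is the GAP output the descriptions refer to). Swapping in any other model name from the table works the same way.

```python
# Minimal sketch: image feature extraction with one of the timm CLIP encoders above.
# Assumes `pip install timm torch pillow`; "example.jpg" is a placeholder path.
import timm
import torch
from PIL import Image

# num_classes=0 strips the classification head, so the model
# returns pooled image features instead of logits.
model = timm.create_model("resnet101_clip_gap.openai", pretrained=True, num_classes=0)
model.eval()

# Recreate the preprocessing (resize, crop, normalization) the weights expect.
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

image = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))  # shape: (1, feature_dim)
print(features.shape)
```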
The listing also includes one image-to-text model:

| Model | Author | License | Downloads | Likes | Description |
|---|---|---|---|---|---|
| Git Base One Piece | ayoubkirouane | MIT | 16 | 0 | A vision-language model fine-tuned from Microsoft's `git-base`, built to generate descriptive captions for images from the anime One Piece. Tagged "Image-to-Text · Transformers · Supports Multiple Languages". |
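Captions can be generated with the standard GIT pipeline in `transformers`. This is a sketch under assumptions: the Hub repo id `ayoubkirouane/git-base-One-Piece` is inferred from the listing (verify it on the Hub), and the image path is a placeholder.

```python
# Hedged sketch: image captioning with the fine-tuned GIT model via transformers.
# Assumes `pip install transformers torch pillow`.
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

repo_id = "ayoubkirouane/git-base-One-Piece"  # assumed repo id from the listing
processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

image = Image.open("one_piece_frame.jpg").convert("RGB")  # placeholder path
inputs = processor(images=image, return_tensors="pt")

# GIT generates the caption autoregressively from the image tokens.
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```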